Welcome to Data Handling 2023!

Background

‘Data Science’?

“This coupling of scientific discovery and practice involves the collection, management, processing, analysis, visualization, and interpretation of vast amounts of heterogeneous data associated with a diverse array of scientific, translational, and inter-disciplinary applications.”

University of Michigan ‘Data Science Initiative’, 2015

But, what about statistics?!

“Seemingly, statistics is being marginalized here; the implicit message is that statistics is a part of what goes on in data science but not a very big part. At the same time, many of the concrete descriptions of what the DSI will actually do will seem to statisticians to be bread-and-butter statistics. Statistics is apparently the word that dare not speak its name in connection with such an initiative!”

David Donoho (2015). 50 Years of Data Science

What’s new about all this?

“All in all, I have come to feel that my central interest is in data analysis, which I take to include, among other things: procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data.”

John Tukey (The Future of Data Analysis, 1962!)

Technological change

Relevance for modern economic research

Data science in economics skill set

Data science as a life skill

“More than anything, what data scientists do is make discoveries while swimming in data. … As they make discoveries, they communicate what they’ve learned and suggest its implications for new business directions. Often they are creative in displaying information visually and making the patterns they find clear and compelling.

They advise executives and product managers on the implications of the data for products, processes, and decisions.

What kind of person does all this? Think of him or her as a hybrid of data hacker, analyst, communicator, and trusted adviser. The combination is extremely powerful - and rare.”

Davenport and Patil (2012). Data Scientist: The Sexiest Job of the 21st Century

Organisation of the Course

Our Team - At Your Service

Matthias Rösti, Andrea Burro, Aurélien Sallin

Introduction: Aurélien Sallin

  • 2022-today: Expert in Health Care Research, SWICA Health Organization, Winterthur
  • 2022-today: Post-Doc researcher and lecturer, HSG
  • 2018-2022: PhD in Economics and Finance, HSG


Research at SWICA

  • Using real-world claims data to assess the effectiveness of health technology tools
  • Using (causal) machine learning to evaluate the effect of health policies on doctors’ prescribing behaviour
  • Financing models for mandatory health care in Switzerland


Other Research in Economics of Education

  • Misclassification rates for gifted students
  • Evaluation of Special Education programs

Course Structure

Course concept: lectures

  • Lectures (Thursday morning)
    • Background/Concepts
    • Illustration of concepts
    • Illustration of ‘hands-on’ approaches

Course concept: special lectures

  • 30.11.2023: Industry Insights
    • Matteo Courthoud, PhD: Senior Economist at Zalando
  • 14.12.2023: Federal Administration Insights
    • Florian Chatagny, PhD: Head of Data Science Team, Federal Finance Administration

Course concept: exercises

  • Exercise sheets (handed out every other week)
    • Some conceptual questions
    • Hands-on exercises/tutorials in R
    • Detailed solution videos
    • The first exercise sheet (setting up R/RStudio) is available on StudyNet/Canvas today

The Elephant in the Room


[Image: Artificial Intelligence as the “elephant in the room”, comic cartoon style]

Course concept

  • Learning mode in this course: prepare with the reading, attend the lecture, recap key concepts in the lecture notes (self-study), work on the exercises, watch the solution video, come to the exercise session, repeat…

  • Strongly encouraged: (virtual) learning groups!

    • The biweekly exercises provide an opportunity for this.
    • Tackle the tricky exercises together!

Course concept: exercise sessions

  • In-class exercise sessions (biweekly evening sessions)
    • Discussion of exercises and additional input
    • Recap of concepts
    • Q&A, support
    • Time for more coding!

Part I: Data (Science) fundamentals

Date        Topic
21.09.2023  Introduction: Big Data/Data Science, course overview
28.09.2023  Programming with R
05.10.2023  An introduction to data and data processing
05.10.2023  Exercises/Workshop 1: Tools, programming
12.10.2023  Data storage and data structures
12.10.2023  Exercises/Workshop 2: Data storage and data structures
19.10.2023  Web data, text, and images
26.10.2023  Data sources, data gathering, data import
26.10.2023  Exercises/Workshop 3: Web data, text, and images

Part II: Data gathering and preparation

Date        Topic
16.11.2023  Data preparation and manipulation
23.11.2023  Basic statistics and data analysis with R
23.11.2023  Exercises/Workshop 4: Data gathering, data import
30.11.2023  Guest Lecture: Matteo Courthoud (Senior Economist and Data Scientist @Zalando)

Part III: Analysis, visualisation, output

Date        Topic
07.12.2023  Visualisation, dynamic documents
07.12.2023  Exercises/Workshop 5: Data preparation and applied data analysis with R
14.12.2023  Guest Lecture: Florian Chatagny (Head of Data Science @Federal Finance Administration in Bern)
21.12.2023  Exercises/Workshop 6: Visualisation, dynamic documents
21.12.2023  Summary, Wrap-Up, Q&A, Feedback
21.12.2023  Exam for Exchange Students

Core course resources

  • All information and materials (notes, slides, course sheet, syllabus, etc.) are available on StudyNet/Canvas.
  • Core materials will also be made available on Nuvolos.

Main textbooks

Further resources

Exam information

  • Central, written examination: digital, BYOD (bring your own device)! There will be an instructional session led by the head of the digital examinations team (date TBD).
  • Multiple choice questions.
  • A few open questions.
  • Theoretical concepts and practical applications in R (questions based on code examples).

Exam information II

  • We will release sample multiple-choice questions via Quizzes on StudyNet/Canvas (same format and style as the exam questions).
  • Exchange students who need to take the exam before the central exam block: the exam for exchange students is scheduled for 21.12.2023.

And now this…

Q&A
